Same State, Different Task: Continual Reinforcement Learning without Interference
Authors
Abstract
Continual Learning (CL) considers the problem of training an agent sequentially on a set of tasks while seeking to retain performance on all previous tasks. A key challenge in CL is catastrophic forgetting, which arises when performance on a previously mastered task is reduced when learning a new task. While a variety of methods exist to combat forgetting, in some cases tasks are fundamentally incompatible with each other and thus cannot be learnt by a single policy. This can occur in reinforcement learning (RL), where an agent may be rewarded for achieving different goals from the same observation. In this paper we formalize this "interference" as distinct from forgetting. We show that existing CL methods based on single neural network predictors with shared replay buffers fail in the presence of interference. Instead, we propose a simple method, OWL, to address this challenge. OWL learns a factorized policy, using shared feature extraction layers, but separate heads, each specializing on a new task. The separate heads in OWL are used to prevent interference. At test time, we formulate policy selection as a multi-armed bandit problem, and show it is possible to select the best policy for an unknown task using feedback from the environment. The use of bandit algorithms allows the OWL agent to constructively re-use the continually learnt policies at different times during an episode. We show in multiple RL environments that existing replay-based methods fail, while OWL is able to achieve close to optimal performance when trained sequentially.
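The test-time policy selection described in the abstract can be illustrated with a standard multi-armed bandit. The sketch below is a minimal, hypothetical illustration (class and parameter names are assumptions, not the paper's implementation): a UCB1 bandit treats each policy head as an arm and uses episodic reward feedback to settle on the head best suited to the current, unknown task.

```python
import numpy as np

class BanditHeadSelector:
    """UCB1 bandit over policy heads: a hedged sketch of OWL-style
    test-time head selection. Names and the exploration constant `c`
    are illustrative assumptions."""

    def __init__(self, n_heads: int, c: float = 2.0):
        self.counts = np.zeros(n_heads)  # times each head was selected
        self.values = np.zeros(n_heads)  # running mean reward per head
        self.c = c

    def select(self) -> int:
        # Try every head at least once before applying the UCB rule.
        untried = np.where(self.counts == 0)[0]
        if len(untried) > 0:
            return int(untried[0])
        total = self.counts.sum()
        ucb = self.values + np.sqrt(self.c * np.log(total) / self.counts)
        return int(np.argmax(ucb))

    def update(self, head: int, reward: float):
        # Incremental update of the mean reward for the chosen head.
        self.counts[head] += 1
        self.values[head] += (reward - self.values[head]) / self.counts[head]
```

In use, the agent would call `select()` at a decision point, act with the chosen head for a stretch of the episode, then feed the observed reward back via `update()`; over repeated rounds the selector concentrates on the head whose policy matches the active task.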
Similar References
Continual Reinforcement Learning with Complex Synapses
Unlike humans, who are capable of continual learning over their lifetimes, artificial neural networks have long been known to suffer from a phenomenon known as catastrophic forgetting, whereby new learning can lead to abrupt erasure of previously acquired knowledge. Whereas in a neural network the parameters are typically modelled as scalar values, an individual synapse in the brain comprises a...
Reinforcement Learning without an Explicit Terminal State
The article introduces a reinforcement learning framework based on dynamic programming for a class of control problems where no explicit terminal state exists. This situation occurs especially in the context of technical process control: the control task is not terminated once a predefined target value is reached, but instead the controller has to continue to control the system in order to av...
Tuning Continual Exploration in Reinforcement Learning
This paper presents a model allowing to tune continual exploration in an optimal way by integrating exploration and exploitation in a common framework. It first quantifies exploration by defining the degree of exploration of a state as the entropy of the probability distribution for choosing an admissible action. Then, the exploration/exploitation tradeoff is formulated as a global optimization...
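The entropy-based measure mentioned in this snippet is easy to make concrete. The following is a minimal sketch (the function name is an assumption): the degree of exploration of a state is the Shannon entropy of the policy's action distribution there, which is zero for a deterministic policy and maximal for a uniform one.

```python
import math

def degree_of_exploration(action_probs):
    """Shannon entropy of the action distribution at a state, as a
    measure of exploration. Zero-probability actions contribute nothing."""
    return -sum(p * math.log(p) for p in action_probs if p > 0)

# A uniform distribution over 4 actions gives entropy log(4) (maximal
# exploration); a deterministic policy gives entropy 0.
```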
Task-Oriented Reinforcement Learning
Acknowledgement: This thesis is the result of two years of work, during which I have been accompanied and supported by many people. I am extremely indebted to Dr.
Journal
Journal title: Proceedings of the ... AAAI Conference on Artificial Intelligence
Year: 2022
ISSN: 2159-5399, 2374-3468
DOI: https://doi.org/10.1609/aaai.v36i7.20674